44 research outputs found

    Sampling Random Spanning Trees Faster than Matrix Multiplication

    We present an algorithm that, with high probability, generates a random spanning tree from an edge-weighted undirected graph in $\tilde{O}(n^{4/3}m^{1/2}+n^{2})$ time (the $\tilde{O}(\cdot)$ notation hides $\operatorname{polylog}(n)$ factors). The tree is sampled from a distribution where the probability of each tree is proportional to the product of its edge weights. This improves upon the previous best algorithm due to Colbourn et al., which runs in matrix multiplication time, $O(n^\omega)$. For the special case of unweighted graphs, this improves upon the best previously known running time of $\tilde{O}(\min\{n^{\omega},m\sqrt{n},m^{4/3}\})$ for $m \gg n^{5/3}$ (Colbourn et al. '96, Kelner-Madry '09, Madry et al. '15). The effective resistance metric is essential to our algorithm, as in the work of Madry et al., but we eschew the determinant-based and random-walk-based techniques used by previous algorithms. Instead, our algorithm is based on Gaussian elimination and on the fact that effective resistance is preserved in the graph resulting from eliminating a subset of vertices (called a Schur complement). As part of our algorithm, we show how to compute $\epsilon$-approximate effective resistances for a set $S$ of vertex pairs via approximate Schur complements in $\tilde{O}(m+(n + |S|)\epsilon^{-2})$ time, without using the Johnson-Lindenstrauss lemma, which requires $\tilde{O}(\min\{(m + |S|)\epsilon^{-2}, m+n\epsilon^{-4} +|S|\epsilon^{-2}\})$ time. We combine this approximation procedure with an error correction procedure for handling edges where our estimate isn't sufficiently accurate.
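
The abstract relies on the fact that eliminating vertices (taking a Schur complement) preserves the effective resistances between the vertices that remain. The Python sketch below illustrates only that fact, using exact rational arithmetic on tiny graphs. It is plain, exact Gaussian elimination, not the paper's approximate Schur complement procedure or its tree sampler; the dict-of-dicts graph format and the function names `eliminate_vertex` and `effective_resistance` are assumptions made for illustration, and the graph is assumed connected.

```python
from fractions import Fraction
from itertools import combinations


def eliminate_vertex(graph, v):
    """Schur complement of a weighted graph Laplacian after eliminating vertex v.

    `graph` maps each vertex to a dict {neighbour: weight}. Gaussian elimination
    of v removes it and, for every pair of its neighbours u and x, adds an edge
    of weight w(v, u) * w(v, x) / deg(v), where deg(v) is v's weighted degree.
    """
    nbrs = graph[v]
    deg = sum(nbrs.values())
    # Copy the graph without v and without any edges pointing to v.
    reduced = {u: {x: w for x, w in adj.items() if x != v}
               for u, adj in graph.items() if u != v}
    for u, x in combinations(nbrs, 2):
        w = nbrs[u] * nbrs[x] / deg
        reduced[u][x] = reduced[u].get(x, 0) + w
        reduced[x][u] = reduced[x].get(u, 0) + w
    return reduced


def effective_resistance(graph, a, b):
    """Exact R_eff(a, b) of a connected graph via repeated vertex elimination.

    Eliminating every vertex except a and b leaves the Schur complement onto
    {a, b}: a single edge of weight w, whose resistance is 1 / w.
    """
    reduced = graph
    for v in [u for u in graph if u not in (a, b)]:
        reduced = eliminate_vertex(reduced, v)
    return 1 / reduced[a][b]


# Unit-weight triangle: the direct edge (resistance 1) is in parallel with
# the two-edge path through c (resistance 2), so R_eff = 1 * 2 / (1 + 2) = 2/3.
one = Fraction(1)
triangle = {"a": {"b": one, "c": one},
            "b": {"a": one, "c": one},
            "c": {"a": one, "b": one}}
print(effective_resistance(triangle, "a", "b"))  # 2/3
```

Eliminating `c` adds an edge of weight 1/2 between `a` and `b`, giving a total weight of 3/2 and hence resistance 2/3, which matches the series/parallel calculation on the original graph.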

    Solving Directed Laplacian Systems in Nearly-Linear Time through Sparse LU Factorizations

    We show how to solve directed Laplacian systems in nearly-linear time. Given a linear system in an $n \times n$ Eulerian directed Laplacian with $m$ nonzero entries, we show how to compute an $\epsilon$-approximate solution in time $O(m \log^{O(1)} (n) \log (1/\epsilon))$. Through reductions from [Cohen et al. FOCS'16], this gives the first nearly-linear time algorithms for computing $\epsilon$-approximate solutions to row or column diagonally dominant linear systems (including arbitrary directed Laplacians) and computing $\epsilon$-approximations to various properties of random walks on directed graphs, including stationary distributions, personalized PageRank vectors, hitting times, and escape probabilities. These bounds improve upon the recent almost-linear algorithms of [Cohen et al. STOC'17], which gave an algorithm to solve Eulerian Laplacian systems in time $O((m+n2^{O(\sqrt{\log n \log \log n})})\log^{O(1)}(n \epsilon^{-1}))$. To achieve our results, we provide a structural result that we believe is of independent interest. We show that Laplacians of all strongly connected directed graphs have sparse approximate LU-factorizations. That is, for every such directed Laplacian $\mathbf{L}$, there is a lower triangular matrix $\mathfrak{L}$ and an upper triangular matrix $\mathfrak{U}$, each with at most $\tilde{O}(n)$ nonzero entries, such that their product $\mathfrak{L}\mathfrak{U}$ spectrally approximates $\mathbf{L}$ in an appropriate norm. This claim can be viewed as an analogue of recent work on sparse Cholesky factorizations of Laplacians of undirected graphs. We show how to construct such factorizations in nearly-linear time and prove that, once constructed, they yield nearly-linear time algorithms for solving directed Laplacian systems. Comment: Appeared in FOCS 201
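
To make "LU-factorization of a directed Laplacian" concrete, the Python sketch below builds the Laplacian of a tiny Eulerian graph and factors it exactly with Doolittle elimination (no pivoting) over rationals. This is only an illustration: the paper's factors are sparse ($\tilde{O}(n)$ nonzeros) and only approximate, whereas these are dense and exact. The convention $\mathbf{L} = D_{\text{out}} - A^{\top}$ and the helper names `directed_laplacian`, `lu_decompose` and `matmul` are assumptions made here.

```python
from fractions import Fraction


def directed_laplacian(n, edges):
    """Directed Laplacian L = D_out - A^T for weighted edges (u, v, w) meaning u -> v.

    Column u holds u's out-degree on the diagonal and -w in row v, so every
    column sums to zero; for an Eulerian graph every row sums to zero too.
    """
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v, w in edges:
        L[u][u] += w
        L[v][u] -= w
    return L


def lu_decompose(M):
    """Doolittle LU without pivoting: M = Lo @ Up with Lo unit lower triangular.

    Every pivot except the last must be nonzero. A Laplacian is singular, so for
    a strongly connected graph the final pivot comes out as exactly 0.
    """
    n = len(M)
    Lo = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    Up = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            Up[i][j] = M[i][j] - sum(Lo[i][k] * Up[k][j] for k in range(i))
        for j in range(i + 1, n):
            if Up[i][i] == 0:
                raise ZeroDivisionError(f"zero pivot at row {i}")
            Lo[j][i] = (M[j][i] - sum(Lo[j][k] * Up[k][i] for k in range(i))) / Up[i][i]
    return Lo, Up


def matmul(X, Y):
    """Dense matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


# Directed 3-cycle 0 -> 1 -> 2 -> 0 with unit weights (an Eulerian graph).
L = directed_laplacian(3, [(0, 1, 1), (1, 2, 1), (2, 0, 1)])
Lo, Up = lu_decompose(L)
assert matmul(Lo, Up) == L
print(Up[2][2])  # 0: the Laplacian's null space shows up as a zero last pivot
```

Elimination without pivoting is safe here because, for a strongly connected graph, every proper leading principal submatrix of this Laplacian is a nonsingular M-matrix. The sparsification that keeps the factors at $\tilde{O}(n)$ nonzeros is the part this sketch does not attempt.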

    Optimal Sketching Bounds for Sparse Linear Regression

    We study oblivious sketching for $k$-sparse linear regression under various loss functions, such as an $\ell_p$ norm or a loss from a broad class of hinge-like loss functions, which includes the logistic and ReLU losses. We show that for sparse $\ell_2$ norm regression, there is a distribution over oblivious sketches with $\Theta(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor. This extends to $\ell_p$ loss with an additional additive $O(k\log(k/\varepsilon)/\varepsilon^2)$ term in the upper bound. This establishes a surprising separation from the related sparse recovery problem, which is an important special case of sparse regression. For this problem, under the $\ell_2$ norm, we observe an upper bound of $O(k \log (d)/\varepsilon + k\log(k/\varepsilon)/\varepsilon^2)$ rows, showing that sparse recovery is strictly easier to sketch than sparse regression. For sparse regression under hinge-like loss functions, including sparse logistic and sparse ReLU regression, we give the first known sketching bounds that achieve $o(d)$ rows, showing that $O(\mu^2 k\log(\mu n d/\varepsilon)/\varepsilon^2)$ rows suffice, where $\mu$ is a natural complexity parameter needed to obtain relative error bounds for these loss functions. We again show that this dimension is tight, up to lower order terms and the dependence on $\mu$. Finally, we show that similar sketching bounds can be achieved for LASSO regression, a popular convex relaxation of sparse regression, where one aims to minimize $\|Ax-b\|_2^2+\lambda\|x\|_1$ over $x\in\mathbb{R}^d$. We show that sketching dimension $O(\log(d)/(\lambda \varepsilon)^2)$ suffices and that the dependence on $d$ and $\lambda$ is tight. Comment: AISTATS 202
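
The Python sketch below makes "oblivious sketching for $k$-sparse regression" concrete: it draws a sketch $S$ without looking at the data, solves the $k$-sparse least-squares problem on $(SA, Sb)$, and evaluates the answer on the original $(A, b)$. It is a toy under stated assumptions. The Gaussian sketch, the row count of 30, the brute-force support search, and every function name are choices made here for illustration, not the constructions or constants from the paper. The instance is also noise-free, so the planted support can be recovered exactly.

```python
import itertools
import math
import random


def gaussian_sketch(rows, n, rng):
    """Oblivious sketch: i.i.d. N(0, 1/rows) entries, drawn without looking at A or b."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(rows)]


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def solve(M, y):
    """Solve a small square system M z = y by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [y[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (aug[r][n] - sum(aug[r][j] * z[j] for j in range(r + 1, n))) / aug[r][r]
    return z


def residual(A, b, x):
    """Euclidean norm ||Ax - b||_2."""
    return math.sqrt(sum((sum(a * xi for a, xi in zip(row, x)) - bi) ** 2
                         for row, bi in zip(A, b)))


def best_k_sparse(A, b, k):
    """Exact k-sparse least squares by brute force over all supports (exponential in k)."""
    d = len(A[0])
    best = None
    for support in itertools.combinations(range(d), k):
        cols = [[row[j] for j in support] for row in A]   # A restricted to the support
        gram = matmul(list(zip(*cols)), cols)             # A_S^T A_S
        rhs = [sum(row[i] * bi for row, bi in zip(cols, b)) for i in range(k)]  # A_S^T b
        x = [0.0] * d
        for j, coef in zip(support, solve(gram, rhs)):
            x[j] = coef
        cost = residual(A, b, x)
        if best is None or cost < best[0]:
            best = (cost, support, x)
    return best


# Toy, noise-free instance: b = A x_true with x_true supported on {1, 4}.
rng = random.Random(0)
n, d, k, rows = 200, 6, 2, 30
A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
x_true = [0.0, 3.0, 0.0, 0.0, -2.0, 0.0]
b = [sum(a * xi for a, xi in zip(row, x_true)) for row in A]

S = gaussian_sketch(rows, n, rng)
SA = matmul(S, A)                                         # rows x d instead of n x d
Sb = [sum(s * bi for s, bi in zip(row, b)) for row in S]
_, support, x_hat = best_k_sparse(SA, Sb, k)              # solve only the small problem
print(support, residual(A, b, x_hat))                     # evaluate on the original data
```

Brute force over supports costs $\binom{d}{k}$ small solves, so it is only viable for tiny $d$. The point of the example is that the solver only ever sees a `rows` $\times\, d$ problem instead of the original $n \times d$ one.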

    Efficient Second-Order Shape-Constrained Function Fitting

    We give an algorithm to compute a one-dimensional shape-constrained function that best fits given data in the weighted $L_{\infty}$ norm. We give a single algorithm that works for a variety of commonly studied shape constraints, including monotonicity, Lipschitz continuity and convexity, and more generally, any shape constraint expressible by bounds on first- and/or second-order differences. Our algorithm computes an approximation with additive error $\varepsilon$ in $O\left(n \log \frac{U}{\varepsilon} \right)$ time, where $U$ captures the range of input values. We also give a simple greedy algorithm that runs in $O(n)$ time for the special case of unweighted $L_{\infty}$ convex regression. These are the first (near-)linear-time algorithms for second-order-constrained function fitting. To achieve these results, we use a novel geometric interpretation of the underlying dynamic programming problem. We further show that a generalization of the corresponding problems to directed acyclic graphs (DAGs) is as difficult as linear programming.
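
The abstract does not spell out its algorithm, and its main contribution (second-order constraints such as convexity, handled via a geometric view of the dynamic program) is not reproduced here. As a simpler, swapped-in illustration, the Python sketch below handles only the first-order monotonicity constraint in weighted $L_\infty$. It bisects on the error level $\delta$ with an $O(n)$ greedy feasibility check per step, which yields an additive-$\varepsilon$ fit in $O(n \log(U/\varepsilon))$ time, the same shape of bound the abstract states. For the unweighted case, it also shows a well-known $O(n)$ closed form. The function names are hypothetical, the weights are assumed positive, and $U$ is taken here to be the initial bracket `max(w) * (max(y) - min(y))`.

```python
from itertools import accumulate


def feasible_monotone(y, w, delta):
    """Return a non-decreasing f with w_i * |f_i - y_i| <= delta for all i, or None.

    Greedy: each f_i is the smallest value allowed by its interval
    [y_i - delta/w_i, y_i + delta/w_i] and by the constraint f_i >= f_{i-1}.
    """
    fit, prev = [], float("-inf")
    for yi, wi in zip(y, w):
        fi = max(prev, yi - delta / wi)
        if fi > yi + delta / wi:
            return None
        fit.append(fi)
        prev = fi
    return fit


def weighted_monotone_linf(y, w, eps):
    """Bisection on the error level delta: O(n) per step, O(log(U / eps)) steps."""
    lo, hi = 0.0, max(w) * (max(y) - min(y))  # the constant fit min(y) has error <= hi
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if feasible_monotone(y, w, mid) is None:
            lo = mid
        else:
            hi = mid
    return hi, feasible_monotone(y, w, hi)


def monotone_linf_unweighted(y):
    """Exact O(n) unweighted fit: midpoint of the prefix maximum and the suffix minimum."""
    prefix_max = list(accumulate(y, max))
    suffix_min = list(accumulate(reversed(y), min))[::-1]
    return [(hi + lo) / 2 for hi, lo in zip(prefix_max, suffix_min)]


y, w = [1, 3, 2, 4], [1, 1, 1, 1]
delta, fit = weighted_monotone_linf(y, w, 1e-6)
print(delta, fit)                   # delta is within 1e-6 of the optimum 0.5
print(monotone_linf_unweighted(y))  # [1.0, 2.5, 2.5, 4.0]
```

The greedy check is exact for monotonicity because choosing each $f_i$ as low as possible never hurts later points. Second-order constraints couple consecutive differences, which is where a more elaborate method such as the paper's becomes necessary.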

    The Cosmos of a Public Sector Township: Democracy as an Intellectual Culture

    The public sector plays an important role in responding to the rights of citizens and evolving norms of social interest (Qu 2015). Qu argues that the nature of public enterprise is never final and that there is a constant negotiation between the private and the public emergence of life and rights. One such space where the tension between the private and the public manifests itself is the public sector township, or residential colony, in India. The sociality of hierarchy in public sector organizations manifests itself in the public sector township and may nurture everyday aspirations, angsts and divides. The officer lives in a bigger home, in a bungalow, and the clerk lives in a smaller home, often with a larger family. [excerpt]

    Multiple novel prostate cancer susceptibility signals identified by fine-mapping of known risk loci among Europeans

    Genome-wide association studies (GWAS) have identified numerous common prostate cancer (PrCa) susceptibility loci. We have fine-mapped 64 GWAS regions known at the conclusion of the iCOGS study using large-scale genotyping and imputation in 25 723 PrCa cases and 26 274 controls of European ancestry. We detected evidence for multiple independent signals at 16 regions, 12 of which contained additional newly identified significant associations. A single signal comprising a spectrum of correlated variation was observed at 39 regions, 35 of which are now described by a novel, more significantly associated lead SNP, while the originally reported variant remained the lead SNP in only 4 regions. We also confirmed two association signals in Europeans that had previously been reported only in East Asian GWAS. Based on statistical evidence and linkage disequilibrium (LD) structure, we have curated and narrowed down the list of the most likely candidate causal variants for each region. Functional annotation using data from ENCODE filtered for PrCa cell lines and eQTL analysis demonstrated significant enrichment for overlap with bio-features within this set. By incorporating the novel risk variants identified here alongside the refined data for existing association signals, we estimate that these loci now explain ∼38.9% of the familial relative risk of PrCa, an 8.9% improvement over the previously reported GWAS tag SNPs. This suggests that a significant fraction of the heritability of PrCa may have been hidden during the discovery phase of GWAS, in particular due to the presence of multiple independent signals within the same region.

    Repositioning of the global epicentre of non-optimal cholesterol

    High blood cholesterol is typically considered a feature of wealthy western countries(1,2). However, dietary and behavioural determinants of blood cholesterol are changing rapidly throughout the world(3), and countries are using lipid-lowering medications at varying rates. These changes can have distinct effects on the levels of high-density lipoprotein (HDL) cholesterol and non-HDL cholesterol, which have different effects on human health(4,5). However, the trends of HDL and non-HDL cholesterol levels over time have not been previously reported in a global analysis. Here we pooled 1,127 population-based studies that measured blood lipids in 102.6 million individuals aged 18 years and older to estimate trends from 1980 to 2018 in mean total, non-HDL and HDL cholesterol levels for 200 countries. Globally, there was little change in total or non-HDL cholesterol from 1980 to 2018. This was a net effect of increases in low- and middle-income countries, especially in east and southeast Asia, and decreases in high-income western countries, especially those in northwestern Europe, and in central and eastern Europe. As a result, the countries with the highest level of non-HDL cholesterol, which is a marker of cardiovascular risk, changed from those in western Europe such as Belgium, Finland, Greenland, Iceland, Norway, Sweden, Switzerland and Malta in 1980 to those in Asia and the Pacific, such as Tokelau, Malaysia, The Philippines and Thailand. In 2017, high non-HDL cholesterol was responsible for an estimated 3.9 million (95% credible interval 3.7 million-4.2 million) worldwide deaths, half of which occurred in east, southeast and south Asia. The global repositioning of lipid-related risk, with non-optimal cholesterol shifting from a distinct feature of high-income countries in northwestern Europe, north America and Australasia to one that affects countries in east and southeast Asia and Oceania, should motivate the use of population-based policies and personal interventions to improve nutrition and enhance access to treatment throughout the world. Peer reviewed

    Global variations in diabetes mellitus based on fasting glucose and haemoglobin A1c

    Get PDF
    Fasting plasma glucose (FPG) and haemoglobin A1c (HbA1c) are both used to diagnose diabetes, but they may identify different people as having diabetes. We used data from 117 population-based studies and quantified, in different world regions, the prevalence of diagnosed diabetes, and whether those who were previously undiagnosed and detected as having diabetes in survey screening had elevated FPG, HbA1c, or both. We developed prediction equations for estimating the probability that a person without previously diagnosed diabetes, and at a specific level of FPG, had elevated HbA1c, and vice versa. The age-standardised proportion of diabetes that was previously undiagnosed, and detected in survey screening, ranged from 30% in the high-income western region to 66% in south Asia. Among those with screen-detected diabetes with either test, the age-standardised proportion who had elevated levels of both FPG and HbA1c was 29-39% across regions; the remainder had discordant elevation of FPG or HbA1c. In most low- and middle-income regions, isolated elevated HbA1c was more common than isolated elevated FPG. In these regions, the use of FPG alone may delay diabetes diagnosis and underestimate diabetes prevalence. Our prediction equations help allocate finite resources for measuring HbA1c to reduce the global gap in diabetes diagnosis and surveillance. Peer-reviewed